perceptual representation
PALMER: Perception - Action Loop with Memory for Long-Horizon Planning
To achieve autonomy in a priori unknown real-world scenarios, agents should be able to: i) act from high-dimensional sensory observations (e.g., images), ii) learn from past experience to adapt and improve, and iii) be capable of long-horizon planning. Classical sampling-based planning algorithms (e.g., PRM, RRT) are proficient at handling long-horizon planning. Deep learning-based methods in turn can provide the necessary representations to address the others, by modeling statistical contingencies between observations. In this direction, we introduce a general-purpose planning algorithm called PALMER that combines classical sampling-based planning algorithms with learning-based perceptual representations. For training these perceptual representations, we combine Q-learning with contrastive representation learning to create a latent space where the distance between the embeddings of two states captures how easily an optimal policy can traverse between them.
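The abstract's core idea, a sampling-based roadmap built over learned embeddings, can be sketched roughly as follows. Everything here is a placeholder assumption: `encode` stands in for PALMER's trained perceptual encoder, and `latent_distance` is plain Euclidean distance, whereas in the paper this distance is shaped by Q-learning and contrastive learning to reflect traversal cost under an optimal policy.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))  # stand-in for learned encoder weights (assumption)

def encode(obs):
    # Placeholder for the trained perceptual encoder mapping observations
    # to the latent space.
    return W @ obs

def latent_distance(z1, z2):
    # In PALMER this distance is *trained* to capture how easily an optimal
    # policy can traverse between the two states; Euclidean is a stand-in.
    return float(np.linalg.norm(z1 - z2))

# PRM-style roadmap over a replay memory of embedded past states: connect
# pairs whose latent distance falls under a reachability threshold, so that
# long-horizon plans become shortest paths on this graph.
memory = [rng.normal(size=8) for _ in range(20)]
zs = [encode(o) for o in memory]
threshold = np.median([latent_distance(a, b) for a in zs for b in zs])
edges = [(i, j) for i in range(len(zs)) for j in range(i + 1, len(zs))
         if latent_distance(zs[i], zs[j]) < threshold]
```

A real implementation would replace the threshold test with the learned reachability estimate and run a graph-search planner over `edges`.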
LangNav: Language as a Perceptual Representation for Navigation
Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
We explore the use of language as a perceptual representation for vision-and-language navigation. Our approach uses off-the-shelf vision systems (for image captioning and object detection) to convert an agent's egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained language model to select an action, based on the current view and the trajectory history, that would best fulfill the navigation instructions. In contrast to the standard setup which adapts a pretrained language model to work directly with continuous visual features from pretrained vision models, our approach instead uses (discrete) language as the perceptual representation. We explore two use cases of our language-based navigation (LangNav) approach on the R2R vision-and-language navigation benchmark: generating synthetic trajectories from a prompted large language model (GPT-4) with which to finetune a smaller language model; and sim-to-real transfer where we transfer a policy learned on a simulated environment (ALFRED) to a real-world environment (R2R). Our approach is found to improve upon strong baselines that rely on visual features in settings where only a few gold trajectories (10-100) are available, demonstrating the potential of using language as a perceptual representation for navigation tasks.
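The pipeline described above (caption each view, then let a language model pick the action) can be sketched in a few lines. All names here are hypothetical stubs, not the paper's code: `caption_view` stands in for the off-the-shelf captioner/detector, and `lm_score` stands in for the finetuned language model's score of an action given the instruction, trajectory history, and current view.

```python
def caption_view(view_id):
    # Stand-in for an off-the-shelf image captioner / object detector
    # describing one heading of the egocentric panoramic view.
    captions = {
        0: "a hallway leading to an open door",
        1: "a kitchen counter with a sink",
        2: "a staircase going down",
    }
    return captions[view_id]

def lm_score(prompt, instruction):
    # Stand-in for a finetuned LM's log-probability of the candidate action;
    # here a crude word-overlap score (assumption, for illustration only).
    return -len(set(prompt.split()) ^ set(instruction.split()))

def select_action(instruction, history, candidate_views):
    # Build a purely textual state: instruction + trajectory history +
    # language descriptions of each candidate direction.
    prompt = f"Instruction: {instruction}\nHistory: {'; '.join(history)}\n"
    scored = [(lm_score(prompt + f"go toward {caption_view(v)}", instruction), v)
              for v in candidate_views]
    return max(scored)[1]
```

The key design point the abstract makes is that the agent's state is entirely discrete text, which is what lets a prompted LLM generate synthetic trajectories and enables sim-to-real transfer at the language level.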
Biologically Plausible Learning Rules for Perceptual Systems that Maximize Mutual Information
Consider a neural perceptual system being exposed to an external environment. The system has a certain internal state to represent external events. There is strong behavioral and neural evidence (e.g., Ernst and Banks, 2002; Gabbiani and Koch, 1998) that the internal representation is intrinsically probabilistic (Knill and Pouget, 2004), in line with the statistical properties of the environment. We denote the input signal as x. The perceptual representation is then a probability distribution conditional on x, denoted p(y|x). According to the Infomax principle (Attneave, 1954; Barlow et al., 1961; Linsker, 1988), the system's goal is to maximize the mutual information (MI) between the input x and the output (neuronal response) y, which can be written as max I(x; y). (1.1)
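For reference, the objective in (1.1) expands under the standard definition of mutual information (a textbook identity, not specific to this paper):

```latex
I(x;y) \;=\; H(y) - H(y \mid x)
       \;=\; \iint p(x)\, p(y \mid x)\,
             \log \frac{p(y \mid x)}{p(y)}\; \mathrm{d}x\, \mathrm{d}y ,
```

so maximizing I(x;y) over the conditional p(y|x) trades off high marginal response entropy H(y) against low conditional noise entropy H(y|x).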
An Information-Theoretic Framework for Understanding Saccadic Eye Movements
In this paper, we propose that information maximization can provide a unified framework for understanding saccadic eye movements. In this framework, the mutual information among the cortical representations of the retinal image, the priors constructed from our long-term visual experience, and a dynamic short-term internal representation constructed from recent saccades provides a map for guiding eye navigation. By directing the eyes to locations of maximum complexity in neuronal ensemble responses at each step, the automatic saccadic eye movement system greedily collects information about the external world, while modifying the neural representations in the process. This framework attempts to connect several psychological phenomena, such as pop-out and inhibition of return, to long-term visual experience and short-term working memory. It also provides an interesting perspective on contextual computation and formation of neural representation in the visual system.
1 Introduction
When we look at a painting or a visual scene, our eyes move around rapidly and constantly to look at different parts of the scene.
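The greedy fixation policy with inhibition of return described above admits a very small sketch. This is our illustrative assumption, not the paper's model: `salience` stands in for the "complexity in neuronal ensemble responses" map, and inhibition of return is modeled by simply zeroing out visited locations.

```python
import numpy as np

rng = np.random.default_rng(1)
salience = rng.random((5, 5))  # stand-in for ensemble-response complexity

def next_fixations(salience_map, n_saccades, inhibition=0.0):
    # Greedy infomax-style scanpath: repeatedly fixate the location of
    # maximum complexity, then suppress it (inhibition of return).
    s = salience_map.copy()
    path = []
    for _ in range(n_saccades):
        idx = np.unravel_index(np.argmax(s), s.shape)
        path.append(idx)
        s[idx] = inhibition  # don't revisit this location
    return path
```

In the paper's framework the map itself would also be updated after each saccade (the representations are modified in the process); here it is held fixed apart from the suppression step.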